

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Nadjahi, Kimia, Durmus, Alain, Simsekli, Umut, Badeau, Roland

Neural Information Processing Systems

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions, and they have become increasingly popular due to their use in implicit generative modeling (e.g., Wasserstein generative adversarial networks, Wasserstein autoencoders). Emerging from computational optimal transport, the Sliced-Wasserstein (SW) distance has become a popular choice in MEDE thanks to its simplicity and computational benefits. While several studies have reported empirical success on generative modeling with SW, the theoretical properties of such estimators have not yet been established. In this study, we investigate the asymptotic properties of estimators that are obtained by minimizing SW. We first show that convergence in SW implies weak convergence of probability measures in general Wasserstein spaces. Then we show that estimators obtained by minimizing SW (and also an approximate version of SW) are asymptotically consistent. We finally prove a central limit theorem, which characterizes the asymptotic distribution of the estimators and establishes a convergence rate of $\sqrt{n}$, where $n$ denotes the number of observed data points. We illustrate the validity of our theory on both synthetic data and neural networks.
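For readers unfamiliar with the estimator, the following minimal NumPy sketch computes the Monte Carlo approximation of SW_2 between two equal-size empirical samples: draw random directions on the unit sphere, project both samples, and average the closed-form one-dimensional Wasserstein distances between the projections. The function name and all constants are illustrative assumptions, not code from the paper.

```python
# Monte Carlo sketch of the Sliced-Wasserstein distance SW_2, using
#   SW_p^p(mu, nu) = E_{theta ~ Unif(S^{d-1})} [ W_p^p(theta# mu, theta# nu) ]
# approximated with `n_proj` random projection directions.
import numpy as np

def sliced_w2(x, y, n_proj=100, rng=None):
    """SW_2 estimate between equal-size empirical samples x, y of shape (n, d)."""
    rng = np.random.default_rng(rng)
    # Random directions uniformly distributed on the unit sphere S^{d-1}.
    theta = rng.standard_normal((n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto each direction: shape (n, n_proj).
    px, py = x @ theta.T, y @ theta.T
    # For equal-size 1D samples, W_2^2 is the mean of squared differences
    # between sorted values (the quantile coupling).
    px, py = np.sort(px, axis=0), np.sort(py, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

# Usage: SW_2 between two Gaussian samples in R^5, one of them shifted.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5))
y = rng.standard_normal((500, 5)) + 1.0
print(sliced_w2(x, y, n_proj=200, rng=1))
```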


Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance

Neural Information Processing Systems

Minimum distance estimation (MDE) gained recent attention as a formulation of (implicit) generative modeling. It considers minimizing, over model parameters, a statistical distance between the empirical data distribution and the model. This formulation lends itself well to theoretical analysis, but typical results are hindered by the curse of dimensionality. To overcome this and devise a scalable finite-sample statistical MDE theory, we adopt the framework of smooth 1-Wasserstein distance (SWD) $\mathsf{W}_1^{(\sigma)}$. The SWD was recently shown to preserve the metric and topological structure of classic Wasserstein distances, while enjoying dimension-free empirical convergence rates. In this work, we conduct a thorough statistical study of the minimum smooth Wasserstein estimators (MSWEs), first proving the estimator's measurability and asymptotic consistency. We then characterize the limit distribution of the optimal model parameters and their associated minimal SWD. These results imply an $O(n^{-1/2})$ generalization bound for generative modeling based on MSWE, which holds in arbitrary dimension. Our main technical tool is a novel high-dimensional limit distribution result for empirical $\mathsf{W}_1^{(\sigma)}$. The characterization of a nondegenerate limit stands in sharp contrast with the classic empirical 1-Wasserstein distance, for which a similar result is known only in the one-dimensional case. The validity of our theory is supported by empirical results, posing the SWD as a potent tool for learning and inference in high dimensions.
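As a concrete (and deliberately naive) illustration of the quantity under study, the sketch below forms a plug-in estimate of the SWD by adding independent N(0, sigma^2 I_d) noise to each sample and solving the resulting discrete optimal transport problem with the POT library. The helper name `smooth_w1` is an assumption, and, as the reviews below stress, this kind of direct discretization does not by itself escape the curse of dimensionality.

```python
# Plug-in sketch of W_1^{(sigma)}(P, Q) = W_1(P * N_sigma, Q * N_sigma):
# smooth both empirical samples with Gaussian noise, then solve exact EMD.
import numpy as np
import ot  # Python Optimal Transport (POT)

def smooth_w1(x, y, sigma, rng=None):
    rng = np.random.default_rng(rng)
    xs = x + sigma * rng.standard_normal(x.shape)  # samples from P_n * N_sigma
    ys = y + sigma * rng.standard_normal(y.shape)  # samples from Q_n * N_sigma
    # Euclidean cost matrix; exact EMD gives the empirical 1-Wasserstein cost.
    M = ot.dist(xs, ys, metric='euclidean')
    a = np.full(len(xs), 1.0 / len(xs))  # uniform weights on the atoms
    b = np.full(len(ys), 1.0 / len(ys))
    return ot.emd2(a, b, M)

rng = np.random.default_rng(0)
x = rng.standard_normal((300, 3))
y = rng.standard_normal((300, 3)) + 0.5
print(smooth_w1(x, y, sigma=1.0, rng=1))
```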


Reviews: Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Neural Information Processing Systems

Clarity: the article is clear and well written; in this aspect the paper is an "accept" for me. This is an accept as well (6). Quality: this paper is of high quality, and it is clear there is a significant research effort behind it. The combination "theoretical results + empirical validation in simple cases" is sensible given the type of paper this is, and the audience. Accept too (6). Originality: this is the item where I tend to reject more than to accept (5). I think it is definitely original, but all the theoretical contributions seem to me a bit marginal: I am very familiar with Bernton et al. 2018, the paper that develops the technique (in turn, mainly based on Bassetti et al. 2006 and Pollard 1980) that is used here.


Reviews: Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Neural Information Processing Systems

The reviewers liked the paper and voted for an accept, which was confirmed following the authors' feedback. But the discussion highlighted the fact that the results do not discuss the problem of sampling on the unit sphere, which needs to be done when actually learning generative models. This sampling will probably add some variance in practice and should at least be discussed in the final paper and investigated in future work.
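That extra variance is easy to exhibit: the self-contained sketch below (illustrative setup, not from the paper or the reviews) repeats the Monte Carlo SW_2 estimate across independent draws of projection directions and reports the spread, which shrinks as the number of projections grows.

```python
# Variance of the Monte Carlo SW_2 estimate over the random projections.
import numpy as np

def sliced_w2(x, y, n_proj, rng):
    rng = np.random.default_rng(rng)
    theta = rng.standard_normal((n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    px, py = np.sort(x @ theta.T, axis=0), np.sort(y @ theta.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 10))
y = rng.standard_normal((500, 10)) + 0.5
for n_proj in (5, 50, 500):
    # Re-estimate with 20 independent sets of projection directions.
    est = [sliced_w2(x, y, n_proj, seed) for seed in range(20)]
    print(f"n_proj={n_proj}: mean={np.mean(est):.4f}, std={np.std(est):.4f}")
```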


Review for NeurIPS paper: Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance

Neural Information Processing Systems

Additional Feedback: The list of remarks and questions I have: * The current standard for regularization of OT is entropic regularization of the plan (papers of Cuturi [5], and also sample complexity results [3,4]). This paper seems to mostly ignore this literature, which is quite weird, given the fact that the goals are (almost) the same. Given the fact that entropic regularization can (should) be viewed as a "cheap proxy" for Gaussian smoothing, a proper and detailed comparison seems in order. The authors seem to be using a re-sampling scheme: sampling from P_n and N(sigma) and adding the obtained values produces samples from P_n * N(sigma). But the potential problem is that, to cope with the curse of dimensionality, this might require a number of samples exponential in the dimension.
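The quoted re-sampling scheme is the standard fact that adding independent noise convolves the measures: if X ~ P_n and Z ~ N(0, sigma^2 I_d) are independent, then X + Z ~ P_n * N(sigma). A quick NumPy moment check (with assumed toy data and tolerances) illustrates this; it says nothing, of course, about how many such samples the empirical W_1 needs in high dimension.

```python
# Moment check: convolving with N(0, sigma^2 I_d) preserves the mean and
# adds sigma^2 I_d to the covariance.
import numpy as np

rng = np.random.default_rng(0)
sigma, d, n = 0.7, 4, 200_000
x = rng.standard_normal((n, d)) @ np.diag([1.0, 2.0, 0.5, 1.5])  # toy "data"
z = sigma * rng.standard_normal((n, d))                          # noise ~ N_sigma
xz = x + z                                                       # ~ P_n * N(sigma)
print(np.allclose(xz.mean(0), x.mean(0), atol=0.02))             # mean preserved
print(np.allclose(np.cov(xz.T), np.cov(x.T) + sigma**2 * np.eye(d), atol=0.05))
```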


Review for NeurIPS paper: Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance

Neural Information Processing Systems

The reviewers agree that this is a good paper that deserves acceptance. The contributions are useful from a statistical point of view. They also agree that the computational limitations should be put more upfront: the idea of Gaussian smoothing is of limited interest to the NeurIPS community unless one has an efficient algorithm to solve optimal transport between the smoothed densities, which is not the case yet (any method based purely on discretization, as proposed here, inevitably suffers from the curse of dimensionality). The authors mention in the rebuttal that an idea is to parameterize the dual variable with a neural network, but this leads to an object that is very different from the SWD, since neural networks have inductive biases. For these reasons, I recommend accept (poster).


Gradients should stay on Path: Better Estimators of the Reverse- and Forward KL Divergence for Normalizing Flows

Vaitl, Lorenz, Nicoli, Kim A., Nakajima, Shinichi, Kessel, Pan

arXiv.org Artificial Intelligence

We propose an algorithm to estimate the path-gradient of both the reverse and forward Kullback-Leibler divergence for an arbitrary manifestly invertible normalizing flow. The resulting path-gradient estimators are straightforward to implement, have lower variance, and lead not only to faster convergence of training but also to better overall approximation results compared to standard total gradient estimators. We also demonstrate that path-gradient training is less susceptible to mode-collapse. In light of our results, we expect that path-gradient estimators will become the new standard method to train normalizing flows for variational inference.
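To make the path-gradient idea concrete, here is a hedged PyTorch sketch for the reverse KL with a single affine flow layer, which is invertible in closed form. The flow density is evaluated with detached parameters, so the gradient propagates only along the sample path z; the affine layer, the target, and all names are illustrative assumptions, far simpler than the arbitrary manifestly invertible flows treated in the paper.

```python
# Path-gradient sketch for the reverse KL with an affine flow z = exp(s)*eps + b.
# The total-gradient estimator would differentiate log q_theta(z) directly; the
# path gradient re-evaluates the density with detached parameters instead.
import torch

torch.manual_seed(0)
s = torch.zeros(2, requires_grad=True)   # log-scale parameters
b = torch.zeros(2, requires_grad=True)   # shift parameters
base = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
target = torch.distributions.Normal(torch.tensor([1.0, -1.0]), 0.5)

def log_q_detached(z):
    # Flow density via change of variables, with *detached* parameters:
    # invert the flow and add the log-det-Jacobian of the inverse (-sum(s)).
    s_d, b_d = s.detach(), b.detach()
    eps = (z - b_d) * torch.exp(-s_d)    # gradient flows only through z
    return base.log_prob(eps).sum(-1) - s_d.sum()

eps = base.sample((1024,))
z = torch.exp(s) * eps + b               # reparameterized sample path
path_loss = (log_q_detached(z) - target.log_prob(z).sum(-1)).mean()
path_loss.backward()
print("path gradient d/ds:", s.grad, " d/db:", b.grad)
```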

